Smart Paper Notes

RouteNet-Fermi: Network Modeling with Graph Neural Networks

Miquel Ferriol-Galmés , Jordi Paillisse , José Suárez-Varela , Krzysztof Rusek , Shihan Xiao , Xiang Shi , Xiangle Cheng , Pere Barlet-Ros , Albert Cabellos-Aparicio

Categories: Artificial Intelligence | Machine Learning

2022-12-22

Network models are an essential block of modern networks. For example, they are widely used in network planning and optimization. However, as networks increase in scale and complexity, some models present limitations, such as the assumption of Markovian traffic in queuing theory models, or the high computational cost of network simulators. Recent advances in machine learning, such as Graph Neural Networks (GNN), are enabling a new generation of network models that are data-driven and can learn complex non-linear behaviors. In this paper, we present RouteNet-Fermi, a custom GNN model that shares the same goals as queuing theory, while being considerably more accurate in the presence of realistic traffic models. The proposed model accurately predicts the delay, jitter, and loss in networks. We have tested RouteNet-Fermi in networks of increasing size (up to 300 nodes), including samples with mixed traffic profiles -- e.g., with complex non-Markovian models -- and arbitrary routing and queue scheduling configurations. Our experimental results show that RouteNet-Fermi achieves accuracy similar to that of computationally expensive packet-level simulators and scales accurately to large networks. For example, the model produces delay estimates with a mean relative error of 6.24% when applied to a test dataset with 1,000 samples, including network topologies one order of magnitude larger than those seen during training.
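The 6.24% figure above is a mean relative error. As a minimal sketch of how such a metric is commonly computed, the helper below averages |predicted - actual| / |actual| over paired samples. The function name and the sample delays are hypothetical, and the abstract does not state the paper's exact metric definition, so treat this as an assumption rather than the authors' evaluation code.

```python
def mean_relative_error(predicted, actual):
    """Average of |predicted - actual| / |actual| over paired samples."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - a) / abs(a) for p, a in zip(predicted, actual))
    return total / len(actual)


# Hypothetical per-path delays in seconds (illustrative, not data from the paper).
true_delays = [0.10, 0.20, 0.40]
predicted_delays = [0.11, 0.19, 0.40]

mre = mean_relative_error(predicted_delays, true_delays)
print(f"mean relative error: {mre:.2%}")
```

A relative metric is a natural fit here because delays on different paths can differ by orders of magnitude, so an absolute error would be dominated by the slowest paths.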

Translated by Google Translate

Semantic Encoder Guided Generative Adversarial Face Ultra-Resolution Network

Xiang Wang , Yimin Yang , Qixiang Pang , Xiao Lu , Yu Liu , Shan Du

Categories: Computer Vision | Machine Learning

2022-11-18

Face super-resolution is a domain-specific image super-resolution, which aims to generate High-Resolution (HR) face images from their Low-Resolution (LR) counterparts. In this paper, we propose a novel face super-resolution method, namely Semantic Encoder guided Generative Adversarial Face Ultra-Resolution Network (SEGA-FURN) to ultra-resolve an unaligned tiny LR face image to its HR counterpart with multiple ultra-upscaling factors (e.g., 4x and 8x). The proposed network is composed of a novel semantic encoder that has the ability to capture the embedded semantics to guide adversarial learning and a novel generator that uses a hierarchical architecture named Residual in Internal Dense Block (RIDB). Moreover, we propose a joint discriminator which discriminates both image data and embedded semantics. The joint discriminator learns the joint probability distribution of the image space and latent space. We also use a Relativistic average Least Squares loss (RaLS) as the adversarial loss to alleviate the gradient vanishing problem and enhance the stability of the training procedure. Extensive experiments on large face datasets have proved that the proposed method can achieve superior super-resolution results and significantly outperform other state-of-the-art methods in both qualitative and quantitative comparisons.
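The relativistic average least-squares (RaLS) adversarial loss mentioned above can be sketched as follows. This follows the general RaLS formulation from the relativistic GAN literature, in which each score is compared against the mean score of the opposite class. The function names are hypothetical, and the abstract does not give SEGA-FURN's exact loss weighting or how the joint discriminator's scores are produced, so those details are assumptions.

```python
def _mean(values):
    return sum(values) / len(values)


def rals_losses(real_scores, fake_scores):
    """Return (discriminator_loss, generator_loss) for raw critic scores.

    real_scores: discriminator outputs on real (HR) samples.
    fake_scores: discriminator outputs on generated (super-resolved) samples.
    """
    mean_real = _mean(real_scores)
    mean_fake = _mean(fake_scores)

    # Discriminator: real scores should exceed the average fake score by 1,
    # fake scores should fall below the average real score by 1.
    d_loss = (_mean([(r - mean_fake - 1.0) ** 2 for r in real_scores])
              + _mean([(f - mean_real + 1.0) ** 2 for f in fake_scores]))

    # Generator: the same objective with the roles of real and fake swapped.
    g_loss = (_mean([(f - mean_real - 1.0) ** 2 for f in fake_scores])
              + _mean([(r - mean_fake + 1.0) ** 2 for r in real_scores]))
    return d_loss, g_loss


# Illustrative scores: the discriminator separates the two sets by exactly 1.
d_loss, g_loss = rals_losses([1.0, 1.0], [0.0, 0.0])
```

Because the squared penalty grows with the distance from the target margin rather than saturating, gradients do not vanish when the discriminator becomes confident, which is the stability benefit the abstract refers to.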

Care for the Mind Amid Chronic Diseases: An Interpretable AI Approach Using IoT

Jiaheng Xie , Xiaohang Zhao , Xiang Liu , Xiao Fang

Categories: Artificial Intelligence | Machine Learning

2022-11-08

Health sensing for chronic disease management creates immense benefits for social welfare. Existing health sensing studies primarily focus on the prediction of physical chronic diseases. Depression, a widespread complication of chronic diseases, is however understudied. We draw on the medical literature to support depression prediction using motion sensor data. To involve human expertise in decision-making, safeguard trust in this high-stakes prediction, and ensure algorithm transparency, we develop an interpretable deep learning model: Temporal Prototype Network (TempPNet). TempPNet is built upon the emergent prototype learning models. To accommodate the temporal characteristic of sensor data and the progressive property of depression, TempPNet differs from existing prototype learning models in its capability of capturing the temporal progression of depression. Extensive empirical analyses using real-world motion sensor data show that TempPNet outperforms state-of-the-art benchmarks in depression prediction. Moreover, TempPNet interprets its predictions by visualizing the temporal progression of depression and its corresponding symptoms detected from sensor data. We further conduct a user study to demonstrate its superiority over the benchmarks in interpretability. This study offers an algorithmic solution for impactful social good - collaborative care of chronic diseases and depression in health sensing. Methodologically, it contributes to extant literature with a novel interpretable deep learning model for depression prediction from sensor data. Patients, doctors, and caregivers can deploy our model on mobile devices to monitor patients' depression risks in real time. Our model's interpretability also allows human experts to participate in decision-making by reviewing the interpretation of prediction outcomes and making informed interventions.

TSAA: A Two-Stage Anchor Assignment Method towards Anchor Drift in Crowded Object Detection

Li Xiang , He Miao , Luo Haibo , Yang Huiyuan , Xiao Jiajie

Categories: Computer Vision | Artificial Intelligence

2022-11-02

Among current anchor-based detectors, a positive anchor box will be intuitively assigned to the object that overlaps it the most. The assigned label to each anchor will directly determine the optimization direction of the corresponding prediction box, including the direction of box regression and category prediction. In our practice of crowded object detection, however, the results show that a positive anchor does not always regress toward the object that overlaps it the most when multiple objects overlap. We name it anchor drift. The anchor drift reflects that the anchor-object matching relation, which is determined by the degree of overlap between anchors and objects, is not always optimal. Conflicts between the fixed matching relation and learned experience in the past training process may cause ambiguous predictions and thus raise the false-positive rate. In this paper, a simple but efficient adaptive two-stage anchor assignment (TSAA) method is proposed. It utilizes the final prediction boxes rather than the fixed anchors to calculate the overlap degree with objects to determine which object to regress for each anchor. The participation of the prediction box makes the anchor-object assignment mechanism adaptive. Extensive experiments are conducted on three classic detectors RetinaNet, Faster-RCNN and YOLOv3 on CrowdHuman and COCO to evaluate the effectiveness of TSAA. The results show that TSAA can significantly improve the detectors' performance without additional computational costs or network structure changes.
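The assignment idea described above can be illustrated with a small sketch: an anchor is first marked positive by its own overlap with the objects, and its regression target is then chosen by the overlap of its prediction box rather than of the fixed anchor. The function names, the `pos_thresh` value, and the exact split between the two stages are assumptions made for illustration; the abstract does not specify TSAA's precise rule.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def best_match(box, objects):
    """Index of the object that overlaps `box` the most (objects non-empty)."""
    overlaps = [iou(box, obj) for obj in objects]
    return max(range(len(objects)), key=overlaps.__getitem__)


def two_stage_assign(anchors, predictions, objects, pos_thresh=0.5):
    """Return one target object index per anchor, or None for negatives."""
    targets = []
    for anchor, pred in zip(anchors, predictions):
        # Stage 1 (assumed): the fixed anchor decides positive vs. negative.
        if max(iou(anchor, obj) for obj in objects) < pos_thresh:
            targets.append(None)
            continue
        # Stage 2: the prediction box decides which object to regress toward.
        targets.append(best_match(pred, objects))
    return targets


# Two overlapping objects; the first anchor overlaps object 0 the most,
# but its prediction box has drifted toward object 1.
objects = [(0, 0, 10, 10), (5, 0, 15, 10)]
anchors = [(1, 0, 11, 10), (100, 100, 110, 110)]
predictions = [(4, 0, 14, 10), (100, 100, 110, 110)]

targets = two_stage_assign(anchors, predictions, objects)
print(targets)  # [1, None]
```

In this example a fixed anchor-based rule would keep pulling the first anchor toward object 0, while the prediction-based rule follows the drift and assigns object 1.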

EarthNets: Empowering AI in Earth Observation

Zhitong Xiong , Fahong Zhang , Yi Wang , Yilei Shi , Xiao Xiang Zhu

Categories: Computer Vision

2022-10-10

Earth observation, aiming at monitoring the state of planet Earth using remote sensing data, is critical for improving our daily lives and living environment. With a growing number of satellites in orbit, an increasing number of datasets with diverse sensors and research domains are being published to facilitate the research of the remote sensing community. In this paper, we present a comprehensive review of more than 400 publicly published datasets, including applications like land use/cover, change/disaster monitoring, scene understanding, agriculture, climate change, and weather forecasting. We systematically analyze these Earth observation datasets with respect to five aspects: volume, bibliometric analysis, resolution distributions, research domains, and the correlation between datasets. Based on the dataset attributes, we propose to measure, rank, and select datasets to build a new benchmark for model evaluation. Furthermore, a new platform for Earth observation, termed EarthNets, is released as a means of achieving a fair and consistent evaluation of deep learning methods on remote sensing data. EarthNets supports standard dataset libraries and cutting-edge deep learning models to bridge the gap between the remote sensing and machine learning communities. Based on this platform, extensive deep learning methods are evaluated on the new benchmark. The insightful results are beneficial to future research. The platform and dataset collections are publicly available at https://earthnets.github.io/.

Anomaly Detection in Aerial Videos with Transformers

Pu Jin , Lichao Mou , Gui-Song Xia , Xiao Xiang Zhu

Categories: Computer Vision

2022-09-25

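Unmanned aerial vehicles (UAVs) are widely used for inspection, search, and rescue operations thanks to their low cost, large coverage, and real-time, high-resolution data acquisition capabilities. These operations produce massive volumes of aerial video, in which normal events usually account for an overwhelming proportion. Manually localizing and extracting abnormal events, which contain potentially valuable information, from long video streams is extremely difficult. We therefore develop an anomaly detection method to address this problem. In this paper, we create a new dataset, named Drone-Anomaly, for anomaly detection in aerial videos. The dataset provides 37 training video sequences and 22 testing video sequences from 7 different realistic scenes with various anomalous events. There are 87,488 color video frames (51,635 for training and 35,853 for testing) with a size of 640 × 640 at 30 frames per second. Based on this dataset, we evaluate existing methods and offer a benchmark for this task. Furthermore, we present a new baseline model, Anomaly Detection with Transformers (ANDT), which treats consecutive video frames as a sequence of tubelets, uses a Transformer encoder to learn feature representations from the sequence, and uses a decoder to predict the next frame. Our network models normality in the training phase and identifies events with unpredictable temporal dynamics as anomalies in the test phase. Moreover, to comprehensively evaluate the proposed method, we use not only the Drone-Anomaly dataset but also another dataset. A demo video is available at https://youtu.be/ancczyryoby. We make our dataset and code publicly available.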
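Prediction-based anomaly detection of this kind typically turns the gap between the predicted next frame and the observed frame into a per-frame score. The sketch below uses mean squared error followed by min-max normalization over the video. Both choices, along with the function names and the toy frames, are assumptions for illustration; the abstract does not state which error measure or normalization ANDT uses.

```python
def frame_mse(predicted, actual):
    """Mean squared error between two frames given as flat pixel lists."""
    return sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)


def anomaly_scores(predicted_frames, actual_frames):
    """Min-max normalized prediction error per frame, in [0, 1].

    A model trained only on normal video predicts normal frames well, so a
    high score marks a frame whose dynamics it could not anticipate.
    """
    errors = [frame_mse(p, a)
              for p, a in zip(predicted_frames, actual_frames)]
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All frames were predicted equally well; nothing stands out.
        return [0.0] * len(errors)
    return [(e - lo) / (hi - lo) for e in errors]


# Toy two-pixel frames: the prediction degrades over three time steps.
actual = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
predicted = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]

scores = anomaly_scores(predicted, actual)
print(scores)  # [0.0, 0.25, 1.0]
```

Thresholding these scores, or ranking frames by them, then yields the frame-level anomaly decisions.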

FuTH-Net: Fusing Temporal Relations and Holistic Features for Aerial Video Classification

Pu Jin , Lichao Mou , Yuansheng Hua , Gui-Song Xia , Xiao Xiang Zhu

Categories: Computer Vision

2022-09-22

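Unmanned aerial vehicles (UAVs) are now widely used for data acquisition because of their low cost and fast mobility. As the volume of aerial video grows, demand for automatically parsing these videos is surging. To this end, current research mainly focuses on extracting holistic features with convolutions along both the spatial and temporal dimensions. However, these methods are limited by small temporal receptive fields and cannot adequately capture long-term temporal dependencies, which are important for describing complicated dynamics. In this paper, we propose a novel deep neural network, termed FuTH-Net, that models not only holistic features but also temporal relations for aerial video classification. Furthermore, in a novel fusion module, the multi-scale temporal relations refine the holistic features to yield more discriminative video representations. More specifically, FuTH-Net employs a two-pathway architecture: (1) a holistic representation pathway that learns general features of both frame appearance and short-term temporal variations, and (2) a temporal relation pathway that captures multi-scale temporal relations across arbitrary frames, providing long-term temporal dependencies. A novel fusion module is then proposed to spatiotemporally integrate the two features learned from these pathways. Our model is evaluated on two aerial video classification datasets, ERA and Drone-Action, and achieves state-of-the-art results. This demonstrates its effectiveness and good generalization capacity across different recognition tasks (event classification and human action recognition). To facilitate further research, we release the code at https://gitlab.lrz.de/ai4eo/reasoning/futh-net.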

The Outcome of the 2022 Landslide4Sense Competition: Advanced Landslide Detection from Multi-Source Satellite Imagery

Omid Ghorbanzadeh , Yonghao Xu , Hengwei Zhao , Junjue Wang , Yanfei Zhong , Dong Zhao , Qi Zang , Shuang Wang , Fahong Zhang , Yilei Shi

Categories: Computer Vision

2022-09-06

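This paper presents the scientific outcomes of the 2022 Landslide4Sense (L4S) competition organized by the Institute of Advanced Research in Artificial Intelligence (IARAI). The objective of the competition is to automatically detect landslides from a large-scale, multi-source collection of satellite imagery gathered globally. The 2022 L4S aims to foster interdisciplinary research on recent developments in deep learning (DL) models for semantic segmentation using satellite imagery. Over the past few years, DL-based models have reached performance that meets expectations for image interpretation, owing to the development of convolutional neural networks (CNNs). The main objective of this article is to present the details and the best-performing algorithms featured in this competition. The winning solutions are elaborated with state-of-the-art models such as the Swin Transformer, SegFormer, and U-Net. Advanced machine learning techniques and strategies such as hard example mining, self-training, and mix-up data augmentation are also considered. Moreover, we describe the L4S benchmark dataset to facilitate further comparisons and report the results of the accuracy assessment online. The data is accessible on the Future Development Leaderboard for future evaluation at https://www.iarai.ac.ac.at/landslide4sense/challenge/, and researchers are invited to submit more prediction results, evaluate the accuracy of their methods, compare them with those of other users, and, ideally, improve the landslide detection results reported in this paper.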

Enabling Country-Scale Land Cover Mapping with Meter-Resolution Satellite Imagery

Xin-Yi Tong , Gui-Song Xia , Xiao Xiang Zhu

Categories: Computer Vision

2022-09-01

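High-resolution satellite images can provide abundant, detailed spatial information for land cover classification, which is particularly important for studying complicated built environments. However, because of complex land cover patterns, costly training sample collection, and severe distribution shifts among satellite images, few studies have applied high-resolution images to large-scale land cover mapping with detailed categories. To fill this gap, we present a large-scale land cover dataset, Five-Billion-Pixels. It contains more than 5 billion labeled pixels from 150 high-resolution Gaofen-2 (4 m) satellite images, annotated in a 24-category system covering artificially constructed, agricultural, and natural classes. In addition, we propose a deep-learning-based unsupervised domain adaptation approach that can transfer classification models trained on a labeled dataset (the source domain) to unlabeled data (the target domain) for large-scale land cover mapping. Specifically, we introduce an end-to-end Siamese network that employs dynamic pseudo-label assignment and a class balancing strategy to perform adaptive domain joint learning. To validate the generalizability of our dataset and the proposed approach across different sensors and geographical regions, we carry out land cover mapping on five megacities in China and five cities in five other Asian countries, using PlanetScope (3 m), Gaofen-1 (8 m), and Sentinel-2 (10 m) satellite images. Over a total study area of 60,000 square kilometers, the experiments show promising results even though the input images are entirely unlabeled. Trained on the Five-Billion-Pixels dataset, the proposed approach enables high-quality, detailed land cover mapping across the whole of China and other Asian countries.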

HPO: We won't get fooled again

Kalifou René Traoré , Andrés Camero , Xiao Xiang Zhu

Categories: Machine Learning | Artificial Intelligence | Computer Vision

2022-08-04

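Hyperparameter optimization (HPO) is a well-studied research field. However, the effects and interactions of the components in an HPO pipeline have not yet been well investigated. We therefore ask: can the HPO landscape be biased by the pipeline used to evaluate individual configurations? To address this question, we propose to analyze the effect of the HPO pipeline on HPO problems using fitness landscape analysis. In particular, we study the DS-2019 HPO benchmark dataset, looking for patterns that could indicate a malfunction of the evaluation pipeline, and relate them to HPO performance. Our main findings are: (i) in most instances, large groups of diverse hyperparameters (i.e., multiple configurations) yield the same poor performance, most likely associated with majority-class prediction models; and (ii) in these cases, the correlation between the observed fitness and the average fitness in the neighborhood worsens, potentially making it harder to deploy HPO strategies based on local search. Finally, we conclude that the definition of the HPO pipeline may negatively affect the HPO landscape.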
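Finding (ii) concerns how well a configuration's fitness tracks the average fitness of its neighbors. A minimal sketch of that measurement follows, assuming configurations are integer tuples on a grid and that two configurations are neighbors when they differ by one step in exactly one hyperparameter. The neighborhood definition, the function names, and the toy landscape are all hypothetical; the abstract does not specify how the paper defines neighborhoods on DS-2019.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input has no variance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def neighbors(config, fitness):
    """Evaluated configs that differ by +/-1 in exactly one coordinate."""
    for i in range(len(config)):
        for step in (-1, 1):
            cand = config[:i] + (config[i] + step,) + config[i + 1:]
            if cand in fitness:
                yield cand


def fitness_neighborhood_correlation(fitness):
    """Correlate each config's fitness with its neighbors' mean fitness."""
    own, avg = [], []
    for config, value in fitness.items():
        near = [fitness[n] for n in neighbors(config, fitness)]
        if near:
            own.append(value)
            avg.append(sum(near) / len(near))
    return pearson(own, avg)


# Toy one-dimensional landscape where fitness rises smoothly with the setting.
smooth = {(i,): float(i) for i in range(5)}
r_smooth = fitness_neighborhood_correlation(smooth)
```

A value near 1 indicates a smooth landscape that local search can follow. A large plateau of configurations sharing the same poor fitness adds no gradient information, which is consistent with the weakened correlation the abstract reports for those cases.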
